Mobile DNA — Latest Matching Preprints

1

Recurrent LINE 1 exonization drives transcriptome remodelling in NSCLC

Parida, A. S.; Kumar, A.; Tiwari, B.

2026-04-24 genomics 10.64898/2026.04.22.720055 medRxiv

Top 0.1%

29.7%

Long interspersed nuclear element-1 (LINE-1) elements are the only autonomously active transposable elements in the human genome. These elements are known to play an important role in reshaping the transcriptome. Beyond their established role in retrotransposition, LINE-1 sequences affect gene regulation during post-transcriptional processing. Exonization is one such mechanism, in which LINE-1 sequences integrated in the genome are incorporated through alternative splicing to produce new transcript isoforms. Our work highlights the effect of LINE-1-associated exonization, focusing on the formation of transcript isoforms. Using non-small cell lung cancer (NSCLC) as a model, we conducted a detailed transcriptome study that combines splice junction profiling with gene expression data. Our results show that LINE-1 sequences are often included as exons in host transcripts, leading to the formation of new exons and their various isoforms. The events are validated by solid splice junction evidence that supports their reliability and reproducibility. In particular, repeated analyses revealed certain LINE-1 exonization events that were consistent. This finding indicates that LINE-1 elements act as recurrent sources of splice-ready sequences. Although exonization does not necessarily affect the total expression levels of genes, our study shows that it contributes to transcript diversity. The diversity of isoforms generated potentially affects gene function. This study is limited to NSCLC, but it is likely that exonization events play a crucial role in altering RNA diversity in cancers. The study therefore provides new insights into how transposable elements modify gene structure and function during cancer development.

2

Active and unusually expanded PIF/Harbinger transposable elements in the Caenorhabditis inopinata genome

Sato, K.; Jin, X.; Oomura, S.; Kawahara, K.; Sun, S.; Yoshida, A.; Haruta, N.; Sugimoto, A.; Kikuchi, T.

2026-05-29 genetics 10.64898/2026.05.26.728016 medRxiv

Top 0.1%

18.4%

Background: Transposable elements (TEs) serve as powerful drivers of genome innovation but also threaten genome integrity. The PIF/Harbinger superfamily is distinctive among DNA transposons because mobilisation typically requires two proteins, a DDE transposase and a MADF DNA-binding protein. Caenorhabditis inopinata, the closest known relative of C. elegans, has a TE-rich genome and lacks multiple components of the ERGO-1-class endogenous small-RNA pathway, making it a useful system for examining TE dynamics in a distinct host context. We identified a spontaneous dumpy mutant of C. inopinata caused by insertion of a PIF/Harbinger-family element into the coding region of Cin-dpy-11. The inserted element, designated Harbinger-1M_cIno, belongs to the Turmoil2 lineage originally defined in C. elegans and retains a MADF domain but lacks a recognisable DDE transposase ORF. Genome-wide curation recovered 258 related copies, revealing a strongly asymmetric family structure. Short noncoding derivatives were predominant, MADF-bearing derivatives were expanded, and only one DDE-bearing locus retained an apparently intact transposase gene, suggesting that DDE and MADF functions are partitioned across distinct elements and may be supplied in trans during mobilisation. We also identified a second PIF/Harbinger-derived family, Harbinger-2M_cIno, associated with the Turmoil1 lineage. This family comprises 1,376 copies and therefore records substantial past amplification, but it lacks a detectable DDE source and shows greater sequence divergence and more degraded terminal structures than Harbinger-1M_cIno. Together, these data indicate that the two PIF/Harbinger lineages in C. inopinata differ not in whether amplification occurred, but in when it occurred and whether present-day mobilisation competence has been retained.

3

Long-read sequencing reveals transposable element-derived chimeric transcripts at zygotic genome activation in mammalian embryos

Kawakami, S.; Kitao, K.; Ikeda, S.; Honda, S.

2026-05-28 developmental biology 10.64898/2026.05.25.727629 medRxiv

Top 0.1%

14.7%

Background: Transposable elements (TEs) are mobile genomic sequences that constitute one-third to one-half of the mammalian genome. Recently, TEs have been recognized for their important roles as cis-regulatory elements. TEs are broadly activated during zygotic genome activation (ZGA) in mammalian embryos, where they function as alternative promoters of host genes and drive the transcription of chimeric transcripts. However, the construction of comprehensive chimeric transcript databases based on short-read sequencing remains limited due to the repetitive and abundant nature of TEs in the genome. Here, we used long-read RNA sequencing to construct a comprehensive dataset of chimeric transcripts expressed in ZGA mouse and bovine embryos. Results: We identified 11,996 and 4,755 chimeric transcript variants derived from 2,695 and 1,200 host genes in mouse and bovine, respectively, exceeding the numbers reported in previous short-read-based studies. Among them, 114 orthologous pairs produced chimeric transcripts in both species. Gene Ontology analysis revealed significant enrichment of terms related to transcriptional regulation and protein modification in mouse, whereas no terms were significantly enriched in bovine. Assessment of the protein-coding potential of the TE-driven transcripts using predicted open reading frames (ORFs) revealed that the proportion of "Protein-coding" transcripts was lower, whereas that of "LncRNA" (long non-coding RNA) was higher, compared with all transcripts in both species. Among the ORFs classified as "Protein-coding", comparison with canonical ORFs revealed a tendency for the N terminus to be truncated while the C terminus remained intact in both species. TE-derived promoters used in mouse were enriched for mouse-specific TEs, whereas those in bovine were enriched for older TEs conserved among eutherians. In addition, long-read sequencing detected a greater number and proportion of TEs used as promoters in mouse and bovine than short-read sequencing. Although motif analysis identified KLF5 and OTX2 binding sites upstream of TE-derived promoters in both species, the specific TEs containing these motifs differed between the two species. Conclusions: This study presents the first long-read sequencing analysis of chimeric transcripts in mammalian embryos in two species. Our approach revealed the functional similarities of chimeric transcripts between species, as well as species-specific differences in their TE compositions.

4

TEDEdb: a large-scale resource and multi-cohort analysis of transposable element differential expression in cancer

Calendo, G.; Chaunzwa, M.; Dehzangi, I.; Madzo, J.; Issa, J.-P. J.

2026-04-29 cancer biology 10.64898/2026.04.27.721042 medRxiv

Top 0.1%

12.3%

The human genome consists of nearly 50% repetitive DNA, referred to for decades as "junk DNA". These repetitive sequences, usually under the strict control of epigenetic silencing, have been observed to be aberrantly expressed in cancer. Some of these expressed sequences, e.g., transposable elements (TEs), can induce innate immune responses when de-repressed following treatment with epigenetic therapies. As a result, epigenetic therapy has been suggested to augment cancer therapies. TEs are traditionally ignored in most RNA-seq studies and their expression is often excluded from publicly available data sources. Thus, the vast amount of publicly available RNA-seq data is an untapped resource for exploring the role of TE expression in cancer and cancer treatment. Here, we present a uniform re-analysis of over 7,000 RNA-seq samples, encompassing more than 2,000 differential expression experiments across 220 cancer cell lines and 700 drug treatments. We observed that TE expression is more prone to batch effects than gene expression alone, necessitating the use of meta-analysis techniques to probe the dataset for global trends. We confirm that DNMTis and HDACis are powerful inducers of TEs. We also show that non-epigenetic compounds such as CDK and topoisomerase inhibitors can induce robust up-regulation of transposable elements and confirm that this TE induction is consistent with a viral mimicry response. We make all of the reprocessed data, web application, and database publicly available at: https://dataexplorer.coriell.org/TEDEdb/

5

Evaluating the potential role and contribution of transposable elements to the evolution of microbial multicellularity across the tree of eukaryotes

Correa Perdomo, A. X.; Brown, M. W.; Banson, I.; Robert, J. E.; Thompson, C.; Kalulu, P.; Tice, A. K.; Ray, D. A.

2026-06-25 genetics 10.64898/2026.06.24.734286 medRxiv

Top 0.1%

6.8%

Multicellularity has evolved multiple times across the eukaryotic tree of life, including among protist lineages. Because transposable elements (TEs) strongly influence genome architecture and gene regulation, understanding their potential impact on genome structure and their relationship with gene expression may provide insight into the evolution of multicellularity. Here, we generated a new genome assembly for the facultatively multicellular amoeba Acrasis kona and performed comparative analyses of TE composition, TE diversity, and TE-density organization across diverse protist lineages. Comparative analyses included unicellular and multicellular representatives from across the tree of eukaryotes (Heterolobosea, Filasterea, Cristidiscoidea, and Chlorophyceae), including Naegleria spp., Tetramitus jugosus, Capsaspora owczarzaki, Pigoraptor spp., Fonticula alba, Parvularia atlantis, Volvox carteri, and Chlamydomonas reinhardtii. To examine relationships between TEs and gene regulation, we integrated transcriptomic datasets from A. kona, Capsaspora owczarzaki, and Volvox carteri with genome-wide TE-density analyses of differentially expressed genes. TE abundance and composition varied substantially among lineages, with species that exhibit more complex developmental or cellular organization generally containing higher TE proportions than closely related unicellular taxa. Patterns of TE-density organization near up-regulated, down-regulated, and non-differentially expressed genes also differed among systems, ranging from strong TE depletion in A. kona to weaker or cell-type-specific patterns in Capsaspora and Volvox. Together, these findings suggest that transposable elements are associated with multicellularity across diverse protist lineages, although the specific roles they play appear to be complex, lineage-specific, and not yet fully understood.

6

Lifestyles of Gypsy-family transposons shape their regulatory mechanisms

Papameletiou, A.-M.; Czech Nicholson, B.; Bornelöv, S.; Hannon, G. J.

2026-05-21 genomics 10.64898/2026.05.19.726053 medRxiv

Top 0.1%

6.6%

Transposable elements are a highly diverse group of selfish genomic elements, prevalent across the tree of life, whose uncontrolled propagation poses a threat to genome stability. Recent studies have explored the evolution of Drosophila melanogaster transposable elements, their co-evolution with the host genome, and mechanisms that regulate their activity. However, little is known about their cross-species evolutionary patterns. Long terminal repeat (LTR) retrotransposons are the most active group of transposable elements in Drosophila. They are broadly separated into retroelements, which are active in the germline, and insect endogenous retroviruses that are active in the soma. Somatic elements are hypothesised to infect the germline through their acquisition of virus-derived proteins such as Envelope and sORF2, thus multiplying through successive generations. In this study, we curated the sequences of LTR retrotransposons in 249 drosophilid genomes, allowing us to study their evolution across these species and highlight their varying degrees of conservation. Furthermore, we reveal multiple instances of Envelope protein loss or inactivation that suggest shifts in the expression pattern of these transposons, likely accompanied by adopting different transcriptional control mechanisms. We contrast this with the evolutionary history of sORF2, which we found to be much more stable. Lastly, we examined variations in transposon LTR regions responsible for transcriptional regulation and use predictive modelling to suggest six transcription factors likely involved in their tissue-specific expression. Altogether, we reveal complex, interspecies evolutionary patterns of Gypsy-family LTR retrotransposons and highlight examples of their co-evolution with their host genome.

7

Evolutionary genomics of host-transposon conflict, multilevel selection, and Red Queen dynamics

Parija, M.; Patra, S.; Dahanukar, N.

2026-07-03 evolutionary biology 10.64898/2026.07.01.735732 medRxiv

Top 0.1%

5.4%

Transposable elements (TEs) jump from one genomic locus to another. Because an increase in their copy number is a metabolic burden for the host, TEs are considered genomic parasites. Although host-TE co-existence is regarded as an evolutionary arms race, this hypothesis has not been extensively tested, especially using evolutionary genomics. We provide a hypothesis-testing framework to understand the distribution of TEs in genic regions of the host genome, variation in the regulation of TEs by the host, and the effect of these two factors on host-TE co-evolutionary dynamics. We test our hypothesis by examining the distributions of potentially active TEs in the genomes of 78 teleost fishes, representing major families and orders within the clade. Our analysis reveals a coevolutionary arms race as predicted by Red Queen dynamics.

8

EDTA v2: enabling scalable TE annotation in animal genomes

Ou, S.; Lu, T.; Nguyen, H.; Gerhardt, K.; Fang, N. F.; Rashid, U.; Guhlin, J.; Dainat, J.; Bao, Z.; Bayer, P. E.; Na, Y.; Benson, C.

2026-07-06 genomics 10.64898/2026.07.01.735963 medRxiv

Top 0.1%

5.3%

The Extensive de-novo TE Annotator (EDTA) automates transposable element annotation in plant genomes but lacks direct LINE/SINE detection, limiting its applicability to animal genomes. We present EDTA v2, which integrates LINE and SINE detection, completely rewrites TIR-Learner for deployability and scalability, and accelerates structural detectors by up to two orders of magnitude. Tested in 30 animal genomes from the Vertebrate Genomes Project Phase I, EDTA v2 bridges the non-LTR detection gap that has prevented automated TE annotation in animals.

9

Biochemical and kinetic properties of a Type III restriction-modification enzyme Mbo45V from the host-adapted pathogen Mycoplasma bovis

Ahmed, I.; Singh, A. P.; Chauhan, O. P.; Bhagat, K.; Gopinath, A.; Saikrishnan, K.

2026-05-04 biochemistry 10.64898/2026.05.01.722158 medRxiv

Top 0.1%

5.0%

Type III restriction-modification (RM) enzymes are a prominent bacterial defense against bacteriophages and invading foreign DNA, and they also modulate the host's epigenetic landscape. Genome analysis of the host-adapted Mycoplasma bovis PG45, which has a very small genome, revealed a Type III RM locus comprising one res and three mod genes. We characterized Mbo45V, a representative enzyme encoded by this locus. The enzyme forms a heterotrimeric complex consisting of two Mod subunits and one Res subunit. Mbo45V recognizes the asymmetric sequence 5'-YAATC-3' (Y = T/C) and cleaves DNA having at least two head-to-head oriented sites ~26-28 bp away from the recognition site. Methylation of the second adenine of the target site using the cofactor S-adenosylmethionine (SAM) protects DNA from restriction, while the SAM analogue sinefungin enhances DNA binding and cleavage. Kinetic studies reveal that Mbo45V exhibits relatively weak DNA binding affinity and an unusually high Km for SAM, indicating low cofactor affinity compared to prototypical enzymes such as EcoP15I. ATPase activity is strongly stimulated by cognate DNA and is inhibited upon methylation of the substrate, suggesting a regulatory interplay between methylation and restriction functions. Comparative analysis indicates that, although Mbo45V shares core mechanistic features with prototypes from Escherichia coli, its kinetic parameters are distinct. These differences likely reflect adaptation to the stable intracellular environment of M. bovis, in contrast to the fluctuating conditions encountered by the enteric bacteria.

10

HERVs as building blocks of RNA regulatory architecture in the human genome

Montserrat-Ayuso, T.; Pujol, A.; Esteve-Codina, A.

2026-05-01 genomics 10.64898/2026.04.29.721355 medRxiv

Top 0.1%

4.4%

Human endogenous retroviruses (HERVs) comprise nearly 8% of the human genome and have contributed extensively to gene regulatory evolution. However, their roles in RNA-centered regulatory processes remain poorly characterized. Here, we present a genome-wide annotation of RNA regulatory features embedded within HERV internal regions and long terminal repeats (LTRs), revealing that HERV sequences act as pervasive components of the human transcriptome. Systematic analysis of RNA-binding protein (RBP) motifs uncovers structured, family-specific regulatory architectures, with distinct RBP signatures distinguishing major HERV subfamilies. Notably, HERVH elements are enriched for RBPs associated with developmentally regulated RNA processing, whereas HERVK (HML-2) elements preferentially harbor motifs linked to canonical splicing and mRNA maturation. Integration with gene annotations reveals widespread incorporation of HERV sequences into transcript structures, including more than 4,000 long non-coding RNAs. Conserved retroviral protein domains within predicted open reading frames are strongly enriched in terminal exons and 3' untranslated regions, consistent with potential micropeptide-encoding capacity. In addition, we identify a subclass of lncRNAs largely composed of HERV sequence, indicating that endogenous retroviral loci have been extensively captured within annotated transcripts. Finally, we detect more than 6,500 antisense LTR insertions in transcript termini, defining widespread SPARCS-like (stimulated 3 prime antisense retroviral coding sequences) configurations with potential for double-stranded RNA formation and preferential association with immune-related genes. Together, these results establish HERV sequences as a pervasive layer of RNA regulatory potential embedded within human transcripts, highlighting previously underappreciated roles in post-transcriptional gene regulation.

11

Extended t-cores for the de novo identification of transposable elements and other inexact repeats from short read RNAseq data

Darmon, S.; Mary, A.; Lacroix, V.

2026-07-10 bioinformatics 10.64898/2026.07.06.736737 medRxiv

Top 0.1%

3.4%

Transcribed repeats represent a major challenge in the de novo assembly of transcriptomes from short RNA-seq reads. Young transposable elements (TEs) and other inexact repeats create dense and ambiguous regions in the assembly graph, preventing the correct assembly of transcripts. In this paper, we introduce a fully de novo method based on the discovery of dense regions in the compacted De Bruijn graph (DBG) to identify such repeats directly from short-read RNA-seq data, without requiring a reference genome or repeat database. Our approach defines the extended t-cores, subgraphs of the DBG that capture the complex topology induced by highly expressed inexact repeats appearing in RNA-seq reads. Independently of its interest for transcriptome assembly, the proposed method appears to be effective for the de novo identification of repeats in transcriptomes. After classifying cores using sequence-based motifs to distinguish simple repeats from potential TEs, we demonstrate its potential for the de novo discovery of transposable elements. We validate the approach on a Mus musculus dataset using expressed TE consensus sequences, showing that extended t-cores correspond to known expressed TE families. We also illustrate its de novo discovery potential on a non-model species, Canis lupus familiaris, where the method was also able to recover known transposable elements.

12

Establishing a Retron-Based Cytosine Base Editor for Targeted Hypermutation in Escherichia coli

Shi, X.; Ni, Y.; Tian, N.; Ruan, Q.; Liu, D.; He, J.; Wang, X.

2026-06-20 synthetic biology 10.64898/2026.06.18.733067 medRxiv

Top 0.1%

3.4%

Current cytosine base editors (CBEs) are limited to unidirectional C to T conversions, restricting their applications. Retrons, bacterial genetic elements, encode a reverse transcriptase that generates multicopy single-stranded DNA (msDNA) by reverse transcribing specific non-coding RNA (ncRNA). This msDNA mimics Okazaki fragments during DNA replication, making retrons promising for gene editing. Here, we developed a retron-based cytosine base editor (RCBE) by fusing cytosine deaminase with reverse transcriptase (RT-CDA) within the retron system. RCBE first transcribes ncRNA, allowing RT-CDA to deaminate cytosine on the ncRNA. The modified ncRNA is then reverse transcribed into msDNA, where RT-CDA induces further cytosine deamination. This mutant msDNA introduces specific mutations into target gene sequences, enabling both C to T and G to A conversions. Using RCBE, we demonstrated accelerated molecular evolution of the rpoB gene in Escherichia coli. High-throughput sequencing confirmed that RCBE achieves a mutation rate of up to 0.2% in regions with high GC content. Our findings establish RCBE as a versatile tool, particularly suitable for directed evolution in GC-rich regions, with broad potential applications across various bacterial and eukaryotic hosts.

13

Lineage-specific evolution of LTR retrotransposons under natural selection drives genomic divergence in diploid Oryza species

Gao, L.; Xu, R.-j.

2026-06-20 genomics 10.64898/2026.06.16.732638 medRxiv

Top 0.1%

3.3%

LTR retrotransposons (LTR-RTs) are major drivers of plant genome evolution. However, the principles governing how natural selection shapes their lineage-specific dynamics across closely related species remain elusive. Here, we performed a comprehensive comparative analysis of LTR-RTs across 15 diploid Oryza species, representing all major genome types. We reconstructed the spatiotemporal distribution and evolutionary trajectories of LTR-RTs, revealing three distinct evolutionary patterns influenced by natural selection: lineage-specific expansion under purifying selection, balanced co-evolution under balancing selection, and lineage-specific retention under strong positive selection. We further demonstrated that species-specific LTR-RT families significantly contribute to highly diverged regions (HDRs) and non-aligned regions (NOTALs), drive segmental duplications, and influence genome size variation. Notably, the removal of LTR-RTs, particularly from medium-removal-rate families, is a key factor in genome size contraction. Our study provides a population-genetic perspective on LTR-RT evolution and highlights their critical role in shaping genomic divergence and adaptation in Oryza. Our findings provide a population-genetic framework for understanding how the co-evolutionary dynamics between TEs and their hosts shape genome architecture and adaptation in closely related species.

14

Multi-layered characterization of ~700,000 conserved noncoding elements within the human genome

Fibi-Smetana, S.; Fernandez-Mendoza, F.; Taher, L.

2026-06-17 evolutionary biology 10.64898/2026.06.16.732547 medRxiv

Top 0.1%

3.1%

Conserved noncoding elements (CNEs) have been extensively studied for their roles as regulatory elements, particularly enhancers. However, the advent of technologies like ChIP-seq and ATAC-seq has shifted research focus away from comparative genomics. Here, we leveraged data from large-scale projects like ENCODE to address the resulting gap in the comprehensive functional characterization of CNEs. We first derived a set of ~700,000 CNEs in the human genome from a 470-way mammalian alignment. Phylogenetic inference identified ~670,000 conserved elements within primates and ~240,000 conserved elements across mammals. Our functional genomic analysis revealed that, irrespective of their level of conservation, approximately one third of CNEs exhibit concurrent chromatin accessibility and H3K27 acetylation in at least one of 19 examined tissues and cell lines and thus are likely to act as cis-regulatory elements. Extrapolating these data to additional tissues and cell lines suggested that ~40% of the CNE repertoire possesses cis-regulatory potential. Moreover, we found that the 3D organization of CNEs is non-random; specifically, CNEs are preferentially located toward the centers of topologically associating domains. CNE co-activation networks derived from chromatin accessibility and active histone marks revealed that evolutionary constraints acting on CNEs functioning as cis-regulatory elements reflect not only their isolated individual role, but also their topological context. To summarize, we have generated a novel catalog of CNEs annotated with empirical cis-regulatory evidence. While evolutionary constraint and regulatory function are clearly linked, a comprehensive understanding of their interplay remains elusive. This resource provides a foundation for exploring this relationship systematically.
Significance statement: Previous studies have investigated conserved noncoding element (CNE) evolution, epigenomic landscapes, and 3D genome organization separately, yet a systematic framework integrating these dimensions has been lacking. Here, we identified ~700,000 CNEs, including ~240,000 CNEs deeply conserved across mammals, and show that a large fraction display enhancer-associated epigenomic signatures and are preferentially enriched within TAD centers, highlighting their regulatory relevance. By generating and analyzing this CNE catalog within an integrated evolutionary, epigenomic, and 3D genomic context, our study bridges a critical gap and provides a comprehensive resource to better understand regulatory architecture and its potential contribution to disease-associated variation.

15

Fly Viral Atlas: A single-nucleus transcriptomic atlas of RNA viruses and transposable elements (TEs) in Drosophila melanogaster

Roy, N.; Unckless, R. L.

2026-07-01 genomics 10.64898/2026.06.28.735102 medRxiv

Top 0.1%

2.7%

Drosophila RNA viruses often persist in wild and lab populations, yet their tissue and cellular tropism is poorly understood. In the Fly Cell Atlas (a comprehensive Drosophila single-nucleus transcriptome) data, we detected four RNA virus infections: Nora virus, Drosophila A virus, Drosophila C virus, and Newfield virus. Nora virus and Drosophila A virus were the most abundant and widespread across tissues and cell types, while Drosophila C virus and Newfield virus RNA transcripts were found only in oenocyte and fat body tissues. We found transcriptional changes associated with viral infection in canonical viral immunity genes (e.g. Vago, vir-1). Additionally, we observed that during persistent viral infections, transposable element (TE) transcripts were upregulated in somatic cells. TEs are traditionally associated with the germline, but recent studies and our data suggest they are also expressed in somatic cells. Using the Fly Cell Atlas data, we found that distinct somatic cell types express specific TE subtypes, indicating regulated and cell-type-specific TE activity often overlooked in transcriptomic studies. We present Fly Viral Atlas (https://flyviralatlas.shinyapps.io/home/), a single-nucleus-level atlas of RNA virus and TE expression in Drosophila, providing new insights into viral tropism and TE dynamics across cell types and tissues.

16

Abundance, diversity and activity of endogenous retroviruses in the slow loris.

Michie, C. A. G.; Free, H. B.; Nijman, V.; Kanda, R. K.

2026-06-30 genomics 10.64898/2026.06.25.734490 medRxiv

Top 0.1%

2.6%

Endogenous retroviruses (ERVs) constitute a significant fraction of vertebrate genomes and serve as genomic records of past retroviral infections, while also influencing host biology through regulatory co-option and, in some cases, ongoing retrotransposition. Despite extensive examination of ERVs in haplorrhine primates, equivalent analyses in strepsirrhines remain absent, leaving a substantial gap in our understanding of ERV diversity and evolutionary dynamics across the primate order. Here, we present the first comprehensive characterisation of ERVs in a strepsirrhine primate, identifying 15 Loris Endogenous Retrovirus (LERV) families encompassing 34 subfamilies and over 6,000 insertions in the Nycticebus coucang reference genome. Phylogenetic analyses resolved LERVs into three retroviral genera: betaretroviruses (LERV1-4), type-D betaretroviruses (LERV5-9), and gammaretroviruses (LERV10-15). LERV2a shows multiple hallmarks of recent or potentially ongoing retrotransposition, including a median insertion age of zero, a high proportion of identical LTR pairs, dN/dS ratios comparable to the active retrovirus HTLV, and insertional polymorphism between two conspecific genomes. Comparative genomic screening across Lorisidae revealed that LERV subfamily distribution broadly mirrors estimated insertion ages, with progressively fewer subfamilies detected in more distantly related species. These findings establish a detailed foundation for understanding retroviral evolution in Strepsirrhini and reveal that ongoing retroviral activity is not restricted to haplorrhine primates.

17

Evidence for independent retroviral syncytin-like Env endogenization in non-placental chondrichthyans

Proudley, E.; Reddin, I. G.; Cleal, J. K.; Lewis, R. M.; Laundon, D.

2026-05-07 evolutionary biology 10.64898/2026.05.06.723177 medRxiv

Top 0.1%

2.6%

Viviparity and placentation are remarkable examples of convergent evolution across vertebrates. The evolution of the uniquely intimate mammalian placenta has been associated with the repeated independent capture of fusogenic retroviral Env proteins, called syncytins. Research into syncytin capture has therefore been predominantly focused on resolving their central role in mammalian placentation. As such, the presence of syncytin-like Env proteins outside of mammals, and their role in non-placental physiological contexts, remain much less understood. We expanded this understanding by systematically surveying genomes from 36 chondrichthyan species (sharks, rays, skates, and chimaeras), which display a wide range of independently evolved placental and non-placental reproductive strategies, for the presence of syncytin-like Env genes. We identified 295 candidate syncytin-like Env proteins from 16 chondrichthyan species, with a subset displaying conserved fusogenic domains, structural homology with known syncytins, and genomic signatures of endogenization. Using transcriptomic data from the model catshark Scyliorhinus canicula, we found that syncytin-like Env genes are transcriptionally active in diverse adult tissue types. Using two closely related species of Squalus (spiny dogfish), we present evidence that endogenized Env genes are syntenically conserved, indicative of vertical transmission from a common ancestor before species divergence. Notably, we detected no candidates in any placental shark genome, suggesting that syncytin-like Env capture is not a feature of shark placentation. Our findings expand the known phylogenetic breadth and functional scope of syncytin-like Env protein endogenization beyond mammalian placentation, providing a solid foundation for future investigations into the wider role of retroviral capture in vertebrate biology and evolution.

18

LVV SMRTcap reveals extensive proviral variation in lentiviral vector-transduced CAR T cells

Kaiser, C.; Sadri, G.; Elliott, E. M.; Mroczkowska, J. J.; Ankita, J.; Ferguson, M.; Bushman, F.; Fraietta, J. A.; Rouchka, E. C.; Smith, M.

2026-05-15 cancer biology 10.64898/2026.05.13.724601 medRxiv

Top 0.1%

2.4%

Lentiviral vectors are commonly used to introduce chimeric antigen receptor transgenes into T cells, but routine assays quantify vector copy number or integration sites without sequencing full-length integrated vectors. HIV-1 proviruses often acquire large deletions and cytidine deaminase-driven hypermutation; whether similar variation occurs in therapeutic lentiviral vectors is unclear. We adapted a novel long-read capture approach to enrich long fragments spanning vector DNA and adjacent human sequence, enabling simultaneous integration-site mapping and proviral integrity analysis with single-molecule resolution. In research-grade CAR T cells produced with an experimental, transient-transfection lentiviral vector workflow, 40% of integrated vectors carried recurrent deletions that removed the internal promoter or parts of the chimeric antigen receptor cassette. The dominant promoter deletion was present in the viral stock. In clinical chimeric antigen receptor T cell products, promoter deletions were less frequent, but detectable pre-infusion and post-infusion. Across datasets we observed widespread G-to-A substitutions consistent with restriction factor editing, including changes predicted to introduce premature stop codons within the transgene open reading frame. Our method reveals proviral variants invisible to standard quality-control assays and provides a framework to improve vector production and monitor transgene integrity in clinical products.

19

kmerRRR: A k-mer based tool for functional genomics in Repeat Rich Regions

Rahmat, J.; Pham, T. M.; Larracuente, A. M.

2026-06-25 genomics 10.64898/2026.06.21.732238 medRxiv

Top 0.1%

2.4%

Highly repetitive sequences pose problems for genome assembly and analysis. While advances in long-read sequencing technologies have helped reveal the organization of repetitive genomic sequences at unprecedented resolution, their functional characterization remains difficult because molecular assays that probe protein-DNA interactions and characterize expression often rely on short-read sequencing. The repetitive nature of these regions poses major challenges for methods relying on sequence mapping, which is exacerbated for short reads. Repetitive genome regions often have low mappability, leading to substantial information loss during downstream filtering. To address this challenge, we developed a bioinformatic tool, kmerRRR, that leverages k-mer frequency analyses to enhance the mappability of repetitive regions. kmerRRR compares k-mer frequencies within user-defined loci to their frequencies across the genome to identify repetitive sequences that are overrepresented locally relative to the global background. This approach quantifies locus uniqueness, allowing users to distinguish sequences that are globally repetitive from those that are repetitive but restricted to specific genomic loci. We demonstrated the utility of this method by reanalyzing chromatin profiling data from human, Drosophila, and Arabidopsis centromeres and small RNA sequencing data. Our results show that incorporating local k-mer ratio information enhances read retention and signal interpretation within repetitive regions, thereby recovering biologically meaningful information that is typically lost in conventional analyses. The tool is freely available under the MIT license on GitHub (https://github.com/LarracuenteLab/kmerRRR).
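
The local-versus-global k-mer comparison described in this abstract can be illustrated with a small sketch. This is not kmerRRR's code or interface: the function names (`count_kmers`, `local_kmer_ratios`), the toy sequence, and the choice to define the ratio as locus count divided by genome-wide count are all assumptions made for illustration. Under that assumed definition, a ratio near 1 marks a k-mer that is repetitive but confined to the locus, while a ratio near 0 marks a k-mer that is repetitive across the genome.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq, skipping k-mers that contain N."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts


def local_kmer_ratios(genome, locus_start, locus_end, k):
    """Return {kmer: locus_count / genome_count} for k-mers found in the locus.

    Hypothetical helper: the ratio definition is an assumption, not the
    published kmerRRR statistic.
    """
    genome_counts = count_kmers(genome, k)
    locus_counts = count_kmers(genome[locus_start:locus_end], k)
    # Every locus k-mer also occurs in the genome, so the divisor is never 0.
    return {kmer: n / genome_counts[kmer] for kmer, n in locus_counts.items()}


# Toy genome: a poly-T background with one locus holding an ACG tandem repeat.
toy_genome = "T" * 8 + "ACG" * 4 + "T" * 4 + "T" * 8
ratios = local_kmer_ratios(toy_genome, locus_start=8, locus_end=24, k=4)

# ACGA occurs only inside the locus; TTTT is repetitive everywhere.
print("ACGA", ratios["ACGA"])
print("TTTT", ratios["TTTT"])
```

In this toy case the tandem-repeat k-mer gets a ratio of 1, whereas the poly-T k-mer gets a much smaller ratio because most of its copies lie outside the locus. A real analysis would also need to handle reverse complements, multiple chromosomes, and memory-efficient counting, none of which this sketch attempts.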

20

Transcriptional Characterization of Nuclear-Integrated Organellar DNA in Populus

Arneson, R.; Wittstock, W.; Marceau, A.; Yuan, Y.

2026-07-12 genomics 10.64898/2026.07.08.737317 medRxiv

Top 0.1%

2.3%

The continuous transfer of organellar DNA into the nuclear genome during eukaryotic evolution has resulted in the widespread occurrence of nuclear plastid DNA insertions (NUPTs) and nuclear mitochondrial DNA insertions (NUMTs). However, their functional significance in nuclear gene expression and genome evolution remains largely unresolved. In this study, we employed Oxford Nanopore Direct RNA Sequencing (DRS) to investigate the transcription of NUPTs and NUMTs in the Populus nuclear genome and compared their transcriptional characteristics with their genome-wide insertion patterns. Our analyses revealed that the majority of transcribed NUPTs and NUMTs are enriched within introns and are co-transcribed with their host or adjacent genes in polycistronic-like transcriptional units. In addition, NUPTs and NUMTs frequently generate intronless transcripts, a feature reminiscent of their prokaryotic ancestry. We further identified a putatively functional NUPT-derived psbH gene that is unique to P. trichocarpa, providing new insights into the evolution of nuclear-encoded organelle-targeted genes. We also identified transcribed NUPT and NUMT insertion polymorphisms among alleles, suggesting that organellar DNA insertions contribute to allelic variation and may participate in environmental adaptation. Collectively, our findings reveal previously unrecognized roles of NUPT and NUMT transcription in gene regulation, allelic variation, genome evolution, and the emergence of novel genes.